Graphing in R

Dataset

dat <- na.omit(read.csv("BirdNest.csv"))

R has different plotting frameworks/systems

  1. Base R graphics - included in R installation AND already attached
    • basic plot function plus a lot of other functions which add stuff to the plot
    • still used a lot in other packages esp those where visualization is not the main goal, e.g. some Bioconductor packages
  2. lattice package - included in R installation
    • not as popular as ggplot2; built on top of grid. Some packages shipped with R (e.g. nlme) still use lattice for their plot methods.
  3. grid graphics package - included in R installation
    • a lower-level alternative to base graphics with a more flexible layout system and object definition
    • forms the basis of many (most?) other graphing packages
  4. ggplot2 package - installed separately
    • based on grid graphics, most popular package, though base graphics are catching up!
    • There are lots of other packages which build off of ggplot2
    • After this intro, we will spend most time in this first class using ggplot

The textbook http://www.datavisualisation-r.com/ goes over 1-3, so we will focus on 4 in this class.
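To see the difference between "included" and "attached", it can help to look at the search path. This is only a small sketch; what it prints depends on which packages your session has already loaded.

```r
# Packages on the search path are attached: their functions can be called directly.
attached <- search()

"package:graphics" %in% attached  # base graphics: attached in a default session
"package:grid" %in% attached      # grid: ships with R, attached only after library(grid)

# Functions from an installed but unattached package are still reachable with ::,
# e.g. grid::unit(1, "cm"). ggplot2 has to be installed first with
# install.packages("ggplot2") and then attached with library(ggplot2).
```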

Base R plot function

?plot
## Help on topic 'plot' was found in the following packages:
## 
##   Package               Library
##   graphics              /usr/local/Cellar/r/4.1.3/lib/R/library
##   base                  /usr/local/Cellar/r/4.1.3/lib/R/library
## 
## 
## Using the first match ...
plot(dat$Length, dat$No.eggs)

plot(dat$Length, dat$No.eggs, type = "n")
points(dat$Length, dat$No.eggs, cex = .5, col = "dark red")
title("my base R plot")

dat$plotcolors <- ifelse(dat$Color == 1, "red", "blue")
plot(dat$Length, dat$No.eggs, col=dat$plotcolors)
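Base R does not add a legend for you when you color points this way. A minimal sketch of adding one by hand follows; `toy` is a made-up stand-in for `dat` (only the column names follow the notes), and the legend text is an assumption about what `Color` means.

```r
# Made-up stand-in for dat, so the sketch runs without BirdNest.csv
toy <- data.frame(Length  = c(10, 12, 15, 20, 25, 30),
                  No.eggs = c(5, 4, 4, 3, 3, 2),
                  Color   = c(1, 0, 1, 0, 1, 0))

# Same recoding as above: one color name per row
toy$plotcolors <- ifelse(toy$Color == 1, "red", "blue")

plot(toy$Length, toy$No.eggs, col = toy$plotcolors, pch = 19,
     xlab = "Length", ylab = "No.eggs")

# The legend is built separately, so its colors must be kept in sync by hand
legend("topright", legend = c("Color == 1", "Color != 1"),
       col = c("red", "blue"), pch = 19)
```

Compare this with the ggplot2 versions later in these notes, where the legend is generated from the aesthetic mapping.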

ggplot2

  • Grammar of graphics: the construction of the plot is driven by the data
library(ggplot2)
  1. Start with the ggplot function.

Aesthetics are the visual components of the graph (position, color, size, etc.). An aesthetic mapping connects a column in the data to one of those components.

ggplot(data = dat, mapping = aes(x = Length, y = No.eggs))

  2. Add a geom.

geom functions specify the type of graph (points, lines, etc) to add to the plot.

ggplot(data = dat, mapping = aes(x = Length, y = No.eggs)) + geom_point()

  3. Add a scale. Suppose we want the x-axis on a log scale.

When would we use the scale_* transformation functions and when would we actually calculate the transformed values?

dat$loglength <- log10(dat$Length)
ggplot(data = dat, aes(x = loglength, y = No.eggs)) + geom_point()

ggplot(data = dat, mapping = aes(x = Length, y = No.eggs)) + geom_point() + scale_x_log10()
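One way to look at the question above: both approaches put the points in the same places, and the difference is in how the axis is labeled. The sketch below checks this on made-up data (`toy` is hypothetical; only the column names follow the notes).

```r
library(ggplot2)

# Made-up stand-in for dat
toy <- data.frame(Length  = c(10, 20, 50, 100, 200),
                  No.eggs = c(6, 5, 4, 3, 2))

# Option 1: transform the column yourself; the axis is labeled in log10 units
toy$loglength <- log10(toy$Length)
p_column <- ggplot(toy, aes(x = loglength, y = No.eggs)) + geom_point()

# Option 2: let the scale transform; the axis is labeled in the original units
p_scale <- ggplot(toy, aes(x = Length, y = No.eggs)) + geom_point() + scale_x_log10()

# layer_data() returns the values ggplot2 actually plots; the x positions agree
all.equal(as.numeric(layer_data(p_column)$x), as.numeric(layer_data(p_scale)$x))
```

So a scale_* transformation is probably the better fit when readers should see the original units, and a calculated column when the transformed values are needed elsewhere in the analysis.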

breaks change where the tick marks fall; the data shown stay the same

ggplot(data = dat, mapping = aes(x = Length, y = No.eggs)) + geom_point() + scale_x_log10(breaks = c(10, 15, 20, 30))

limits change the actual information of the plot: points outside the limits are removed

ggplot(data = dat, mapping = aes(x = Length, y = No.eggs)) + geom_point() + scale_x_log10(limits = c(10, 20))
## Warning: Removed 19 rows containing missing values (geom_point).
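The warning comes from the scale turning out-of-range values into missing values. If you only want to zoom in without discarding data, setting limits on the coordinate system is an alternative. The sketch below uses made-up data and a plain continuous axis to keep things simple; `toy` is hypothetical.

```r
library(ggplot2)

# Made-up stand-in for dat
toy <- data.frame(Length  = c(8, 12, 15, 18, 25, 40),
                  No.eggs = c(6, 5, 5, 4, 3, 2))

base_plot <- ggplot(toy, aes(x = Length, y = No.eggs)) + geom_point()

# Scale limits: values outside 10-20 become NA and are dropped when drawn
p_limits <- base_plot + scale_x_continuous(limits = c(10, 20))

# Coordinate limits: only the visible window changes, no data are removed
p_zoom <- base_plot + coord_cartesian(xlim = c(10, 20))

# Count the x values that each version has turned into NA
sum(is.na(layer_data(p_limits)$x))
sum(is.na(layer_data(p_zoom)$x))
```

The distinction matters most once a geom or stat summarizes the data (a smoother, a boxplot), because scale limits change what goes into the summary.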

Colors - another aesthetic mapping

ggplot(data = dat, aes(x = Length, y = No.eggs, color = plotcolors)) + geom_point()

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Closed.)) + geom_point()

More on scales

  • ggplot chooses the type of scale to use for an aesthetic based on the class of data it is mapping to that aesthetic.
    • characters and (most of the time) factors are usually discrete scales.
    • numerical values are usually continuous (mapping to an axis) or gradient (mapping to a color) scales.
  • try a discrete value with a continuous scale (commented out because it stops with an error: a discrete value was supplied to a continuous scale)
# ggplot(data = dat, aes(x = Species, y = No.eggs)) + geom_point() + scale_x_continuous()
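If you want to see that error without stopping your script, you can wrap the build step in tryCatch. This is a sketch on made-up data (`toy` is hypothetical), and the exact wording of the message varies between ggplot2 versions.

```r
library(ggplot2)

# Made-up stand-in for dat: Species is a character column
toy <- data.frame(Species = c("a", "b", "c"), No.eggs = c(4, 5, 3))

p <- ggplot(toy, aes(x = Species, y = No.eggs)) + geom_point() + scale_x_continuous()

# Defining p works; the error only appears when the plot is built (e.g. printed)
result <- tryCatch({
  ggplot_build(p)
  "built without error"
}, error = function(e) conditionMessage(e))

result
```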

Continuous scale colors

  • we will use Totcare instead of Closed. to have a wider range of values
ggplot(data = dat, aes(x = Length, y = No.eggs, color = Totcare)) + geom_point() + scale_colour_gradient(low = "darkblue", high = "hotpink")

breaks change which values are marked on the color legend; the colors of the points stay the same

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Totcare)) + geom_point() + scale_colour_gradient(low = "darkblue", high = "hotpink", breaks = c(20, 30, 40))

limits change the actual information in the plot: values outside the limits are treated as missing and drawn in grey

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Totcare)) + geom_point() + scale_colour_gradient(low = "darkblue", high = "hotpink", breaks = c(20, 30, 40), limits = c(19, 40))

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Totcare)) + geom_point() + scale_colour_gradient(low = "darkblue", high = "hotpink", breaks = c(20, 30, 40), limits = c(19, 30))
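To check which points fall outside the color limits, you can inspect the colors ggplot2 assigns. The sketch below uses made-up data (`toy` is hypothetical) and sets `na.value` explicitly so the out-of-range color is easy to spot.

```r
library(ggplot2)

# Made-up stand-in for dat; two Totcare values (15 and 45) lie outside the limits
toy <- data.frame(Length  = c(10, 12, 15, 18, 25, 40),
                  No.eggs = c(6, 5, 5, 4, 3, 2),
                  Totcare = c(15, 22, 28, 33, 38, 45))

p <- ggplot(toy, aes(x = Length, y = No.eggs, color = Totcare)) +
  geom_point() +
  scale_colour_gradient(low = "darkblue", high = "hotpink",
                        limits = c(19, 40), na.value = "grey50")

# One color per row; rows outside the limits get na.value instead of a gradient color
layer_data(p)$colour
```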

name changes the title of the legend

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Totcare)) + geom_point() + scale_colour_gradient(low = "darkblue", high = "hotpink", breaks = c(20, 30, 40), limits = c(19, 30), name = "Total care")

Discrete scale colors

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() 

plotcolors <- c("red", "purple", "darkorange", "brown", "green", "blue", "yellow")
ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors)

limits control which levels are shown and in what order; levels left out of limits are treated as missing

newlimits <- c("cup", "cavity", "saucer", "burrow", "spherical", "pendant",  "crevice")
ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, limits = newlimits)

newlimits1 <- c( "cavity", "saucer", "burrow", "spherical", "pendant",  "crevice")
ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, limits = newlimits1)

In a discrete scale, breaks overlap with limits, so we recommend picking just one. labels are not connected to the data: a character vector of labels is matched by position, so you have to keep track of the order yourself!

# ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, labels = newlimits)

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, labels = toupper)
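One way to reduce the bookkeeping is to name the color vector, since scale_color_manual matches named values to levels by name rather than by position. A function such as toupper is also safe for labels because it is applied to the levels themselves. The data and the particular color assignments below are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for dat
toy <- data.frame(Length   = c(10, 15, 20, 25),
                  No.eggs  = c(5, 4, 3, 2),
                  Nesttype = c("cup", "cavity", "saucer", "cup"))

# Names tie each color to a level, so the order written here does not matter
named_colors <- c(saucer = "darkorange", cup = "red", cavity = "purple")

p <- ggplot(toy, aes(x = Length, y = No.eggs, color = Nesttype)) +
  geom_point() +
  scale_color_manual(values = named_colors, labels = toupper)

# Colors assigned row by row: cup, cavity, saucer, cup
layer_data(p)$colour
```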

Change name of legend

ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, limits = newlimits, labels = toupper, name = "Type of nest")

Saving plots

Save most recent plot to file

ggsave("plot1.pdf")
## Saving 7 x 5 in image

Can also assign variable name to plot and then save that.

myplot <- ggplot(data = dat, aes(x = Length, y = No.eggs, color = Nesttype)) + geom_point() + scale_color_manual(values = plotcolors, limits = newlimits, labels = toupper, name = "Type of nest")
ggsave(filename = "plot2.png", plot = myplot)
## Saving 7 x 5 in image
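The "Saving 7 x 5 in image" message means ggsave picked the size itself. For reproducible figures it is usually worth setting the size explicitly. A small sketch, with a made-up plot and a temporary file standing in for a real output path:

```r
library(ggplot2)

# Made-up plot standing in for myplot
toy <- data.frame(Length = c(10, 15, 20, 25), No.eggs = c(5, 4, 3, 2))
myplot <- ggplot(toy, aes(x = Length, y = No.eggs)) + geom_point()

# tempfile() keeps the sketch from writing into the working directory
out <- tempfile(fileext = ".pdf")

# width and height are in the given units; the file type comes from the extension
ggsave(filename = out, plot = myplot, width = 6, height = 4, units = "in")

file.exists(out)
```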

Geoms

Change geom to change type of plot

ggplot(data = dat, aes(x = Species, y = Totcare)) + geom_col()

ggplot(data = dat, aes(x = Species, y = Totcare, fill = Nesttype)) + geom_col() + scale_fill_manual(values = plotcolors, labels = toupper)

# geom_col uses the fill aesthetic for the inside of the bars; color only changes the outline
?geom_col 

Long vs wide data

Suppose we want to plot a stacked bar graph breaking down the length of time the birds spend caring for their young.

ggplot expects long data: each column holds a single variable, so each variable can be mapped to one aesthetic. Here Incubate and Nestling are both times, so they should be stacked into one Time column, with a second column saying which kind of care each time refers to. Use pivot_longer from tidyr (or melt from reshape2, though reshape2 is no longer under active development).

?tidyr::pivot_longer
long <- tidyr::pivot_longer(data = dat, cols = c("Incubate", "Nestling"), names_to = "Caretype", values_to = "Time")
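To make the reshaping concrete, here is the same call on a tiny made-up table (`wide` and its values are hypothetical; the column names follow the notes).

```r
library(tidyr)

# Wide: one row per species, one column per kind of care
wide <- data.frame(Species  = c("sp1", "sp2"),
                   Incubate = c(12, 14),
                   Nestling = c(10, 16))

# Long: one row per species and kind of care, with a single Time column
long <- pivot_longer(wide, cols = c("Incubate", "Nestling"),
                     names_to = "Caretype", values_to = "Time")

long
```

Each original row turns into two rows, one for Incubate and one for Nestling, and Species is repeated on both.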

Now we can color our bars based on the Caretype

ggplot(long, aes(x=Species, y=Time, fill=Caretype)) + geom_col()

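Other bar layouts using position: "dodge" puts the bars side by side (unstacked), while "fill" keeps them stacked but rescales each bar to a total of 1, so it shows proportions. We use the first 10 rows so our plots will be smaller as we add more components.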

longsub <- long[1:10,]
ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col(position = "dodge")

ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col(position = "fill")
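To confirm what position = "fill" does to the bar heights, you can look at the values ggplot2 computes. The sketch below uses made-up data in the same long layout (`toy` is hypothetical).

```r
library(ggplot2)

# Made-up long data: two species, two kinds of care each
toy <- data.frame(Species  = rep(c("sp1", "sp2"), each = 2),
                  Caretype = rep(c("Incubate", "Nestling"), times = 2),
                  Time     = c(12, 10, 14, 16))

p_fill <- ggplot(toy, aes(x = Species, y = Time, fill = Caretype)) +
  geom_col(position = "fill")

# Built bar segments: ymin and ymax are the bottom and top of each segment
built <- layer_data(p_fill)

# Total bar height per species after rescaling
totals <- tapply(built$ymax - built$ymin, as.integer(built$x), sum)
totals
```

Each species' segments add up to 1, which is why the y-axis of a "fill" plot reads as a proportion rather than as time.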

Theme

theme makes cosmetic changes: text angle, justification, size, background colors, etc.

?theme
?element_text
ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col(position = "fill") + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Save your theme to a variable, so you can re-use it over and over.

mytheme <- theme_light() + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col(position = "fill") + mytheme

Polar coordinates and facets

For a pie chart, add polar coordinates, mapping y to the angle

  • try different geom_col position
ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col() + coord_polar(theta="y")

ggplot(longsub, aes(x=Species, y=Time, fill=Caretype)) + geom_col(position = "fill") + coord_polar(theta="y")

plot a single pie by setting x=0

ggplot(longsub[1:2,], aes(x=0, y=Time, fill=Caretype)) + geom_col(position = "fill") + coord_polar(theta="y")

plot little pies with facet_wrap

ggplot(longsub, aes(x=0, y=Time, fill=Caretype)) + geom_col(position = "fill") + coord_polar(theta="y") + facet_wrap(facets = vars(Species)) + mytheme